Query Language for Access to Speech Corpora
Abstract
With more and more speech corpora at hand, the unit selection technique is a promising approach in concatenative speech synthesis. What is missing are models of optimal parameters that sufficiently describe the utterances to be produced and their counterparts in collections of speech data. Before such models can be built, existing corpora have to be annotated on possibly relevant linguistic and signal levels. This paper deals with standards developed in the MATE project for the uniform annotation of speech corpora, represented in XML, and with a query language that can access these corpora. These standards may accelerate the identification of optimal elements for the annotation and description of parameters relevant to the unit selection technique.
Similar resources
Query Language for Research in Phonetics
With the growing availability of spoken language corpora, more and more data-driven research in phonetics is possible. The downside of having huge speech corpora is that they have to be segmented and labeled before they can be exploited. As labeling and annotation are time-consuming and costly, there is an interest in standardization which would support the exchange and reuse of labeled data. T...
Selecting the Most Suitable Query Language for Using Outer Joins to Extract Data in Datalog Mode in the DES Deductive Database System
Deductive database systems are designed around a logical data model. In a deductive database system, data are stored as facts, whereas a relational database management system (RDBMS) stores data in tables. The Datalog Educational System (DES) is a deductive database system whose default mode is Datalog mode. It can extract data using outer joins with three query la...
Large Linguistically-Processed Web Corpora for Multiple Languages
The Web contains vast amounts of linguistic data. One key issue for linguists and language technologists is how to access it. Commercial search engines give highly compromised access. An alternative is to crawl the Web ourselves, which also allows us to remove duplicates and near-duplicates, navigational material, and a range of other kinds of non-linguistic matter. We can also tokenize, lemmati...
graphANNIS: A Fast Query Engine for Deeply Annotated Linguistic Corpora
We present graphANNIS, a fast implementation of the established query language AQL for dealing with deeply annotated linguistic corpora. AQL builds on a graph-based abstraction for modeling and exchanging linguistic data, yet all its current implementations use relational databases as the storage layer. In contrast, graphANNIS directly implements the ANNIS graph data model in main memory. We show th...
XSLT as a Linguistic Query Language
Introduction: As the number of natural language applications being developed increases, so does the need for a good linguistic database management system. A linguistic database, more commonly known as a corpus, is a collection of linguistic data, either of written text or of transcriptions of recorded speech. Corpora are designed to be a balanced collection of data that represent some aspect of a ...